229 research outputs found

    Systematic identification of functional plant modules through the integration of complementary data sources

    Get PDF
    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation

    Mode and tempo of gene and genome evolution in plants

    Get PDF

    Are we there yet? : reliably estimating the completeness of plant genome sequences

    Get PDF
    Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation

    Enhanced maps of transcription factor binding sites improve regulatory networks learned from accessible chromatin data

    Get PDF
    Determining where transcription factors (TFs) bind in genomes provides insight into which transcriptional programs are active across organs, tissue types, and environmental conditions. Recent advances in high-throughput profiling of regulatory DNA have yielded large amounts of information about chromatin accessibility. Interpreting the functional significance of these data sets requires knowledge of which regulators are likely to bind these regions. This can be achieved by using information about TF-binding preferences, or motifs, to identify TF-binding events that are likely to be functional. Although different approaches exist to map motifs to DNA sequences, a systematic evaluation of these tools in plants is missing. Here, we compare four motif-mapping tools widely used in the Arabidopsis (Arabidopsis thaliana) research community and evaluate their performance using chromatin immunoprecipitation data sets for 40 TFs. Downstream gene regulatory network (GRN) reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall of Find Individual Motif Occurrences, one of the most frequently used motif-mapping tools, can be overcome by using an Ensemble approach, which combines results from different mapping tools. Several examples are provided demonstrating how the Ensemble approach extends our view on transcriptional control for TFs active in different biological processes. Finally, a protocol is presented to effectively derive more complete cell type-specific GRNs through the integrative analysis of open chromatin regions, known binding site information, and expression data sets. This approach will pave the way to increase our understanding of GRNs in different cellular conditions

    TF2Network : predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information

    Get PDF
    A gene regulatory network (GRN) is a collection of regulatory interactions between transcription factors (TFs) and their target genes. GRNs control different biological processes and have been instrumental to understand the organization and complexity of gene regulation. Although various experimental methods have been used to map GRNs in Arabidop-sis thaliana, their limited throughput combined with the large number of TFs makes that for many genes our knowledge about regulating TFs is incomplete. We introduce TF2Network, a tool that exploits the vast amount of TF binding site information and enables the delineation of GRNs by detecting potential regulators for a set of co-expressed or functionally related genes. Validation using two experimental benchmarks reveals that TF2Network predicts the correct regulator in 75-92% of the test sets. Furthermore, our tool is robust to noise in the input gene sets, has a low false discovery rate, and shows a better performance to recover correct regulators compared to other plant tools. TF2Network is accessible through a web interface where GRNs are interactively visualized and annotated with various types of experimental functional information. TF2Network was used to perform systematic functional and regulatory gene annotations, identifying new TFs involved in circadian rhythm and stress response

    Gene space completeness in complex plant genomes

    Get PDF
    Genome annotations offer ample opportunities to study gene functions, biochemical and regulatory pathways, or quantitative trait loci in plants. Determining the quality and completeness of a genome annotation, and maintaining the balance between them, are major challenges, even for genomes of well-studied model organisms. In this review, we present a historical overview of the complexity in different plant genomes and discuss the hurdles and possible solutions in obtaining a complete and high-quality genome annotation. We illustrate there is no clear-cut answer to solve these challenges for different gene types, but provide tips on guiding the iterative process of generating a superior genome annotation, which is a moving target as our knowledge about plant genomics increases and additional data sources become available

    Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics

    Get PDF
    BACKGROUND: Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. RESULTS: Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other. CONCLUSION: These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view

    TRAPID : an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes

    Get PDF
    Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regards to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system
    • …
    corecore